Dyadic Classification Trees via Structural Risk Minimization
Authors
Abstract
Classification trees are among the most popular classifiers, with ease of implementation and interpretation among their attractive features. Despite their widespread use, theoretical analysis of their performance is scarce. In this paper, we show that a new family of classification trees, called dyadic classification trees (DCTs), is near-optimal (in a minimax sense) for a very broad range of classification problems. This demonstrates that other schemes (e.g., neural networks, support vector machines) cannot perform significantly better than DCTs in many cases. We also show that this near-optimal performance is attained with growing and pruning algorithms whose complexity is linear in the number of training samples. Moreover, the performance of DCTs on benchmark datasets compares favorably to that of standard CART, which is generally more computationally intensive and does not possess similar near-optimality properties. Our analysis stems from theoretical results on structural risk minimization, on which the pruning rule for DCTs is based.
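The grow-then-prune scheme described above can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation: the tree is grown by always splitting cells of [0, 1) at their midpoints (the dyadic splits), and a subtree is kept only if it lowers the empirical errors plus `alpha` per leaf, where the constant penalty `alpha` is a stand-in for the structural-risk-minimization penalty derived in the paper.

```python
import numpy as np

def grow(X, y, lo, hi, depth):
    """Grow a dyadic tree on [lo, hi): always split at the midpoint."""
    idx = (X >= lo) & (X < hi)
    n_pts, n_pos = int(idx.sum()), int(y[idx].sum())
    label = int(2 * n_pos > n_pts)            # majority vote (ties -> 0)
    err = n_pts - n_pos if label else n_pos   # training errors if this were a leaf
    node = {"lo": lo, "hi": hi, "label": label, "err": err, "leaves": 1}
    if depth > 0:
        mid = (lo + hi) / 2.0
        node["kids"] = [grow(X, y, lo, mid, depth - 1),
                        grow(X, y, mid, hi, depth - 1)]
    return node

def prune(node, alpha):
    """Bottom-up pruning: keep a split only when it lowers
    empirical errors + alpha * (number of leaves)."""
    if "kids" in node:
        for kid in node["kids"]:
            prune(kid, alpha)
        kids_err = sum(k["err"] for k in node["kids"])
        kids_leaves = sum(k["leaves"] for k in node["kids"])
        if kids_err + alpha * kids_leaves < node["err"] + alpha:
            node["err"], node["leaves"] = kids_err, kids_leaves
        else:
            del node["kids"]   # collapse: the split does not pay for itself
    return node

def predict(node, x):
    while "kids" in node:
        mid = (node["lo"] + node["hi"]) / 2.0
        node = node["kids"][0] if x < mid else node["kids"][1]
    return node["label"]

# Toy data: label changes at x = 0.5, so pruning should keep exactly two leaves.
X = np.linspace(0.0, 1.0, 200, endpoint=False)
y = (X >= 0.5).astype(int)
tree = prune(grow(X, y, 0.0, 1.0, depth=4), alpha=2.0)
```

Each pass over the data at a given depth touches every sample once, which is the source of the linear complexity claimed in the abstract; pruning then visits each node a constant number of times.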
Similar Articles
Multivariate Dyadic Regression Trees for Sparse Learning Problems
We propose a new nonparametric learning method based on multivariate dyadic regression trees (MDRTs). Unlike traditional dyadic decision trees (DDTs) or classification and regression trees (CARTs), MDRTs are constructed using penalized empirical risk minimization with a novel sparsity-inducing penalty. Theoretically, we show that MDRTs can simultaneously adapt to the unknown sparsity and smooth...
Oracle Inequalities and Adaptive Rates
We have previously seen how sieve estimators give rise to rates of convergence to the Bayes risk by performing empirical risk minimization over H_{k(n)}, where (H_k)_{k ≥ 1} is an increasing sequence of sets of classifiers and k(n) → ∞. However, the rate of convergence depends on k(n). Usually k(n) is chosen to minimize the worst-case rate over all distributions of interest. However, it would be...
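The sieve idea above can be sketched concretely. In this illustration (my construction, not from the excerpt), H_k is the set of classifiers that are piecewise constant on k equal-width bins of [0, 1), and instead of fixing k(n) in advance we select k by minimizing empirical risk plus a generic complexity penalty c * sqrt(k / n); both the class family and the penalty form are assumptions made for the sketch.

```python
import numpy as np

def srm_select(X, y, ks, c=0.5):
    """Over a nested sieve H_k (piecewise-constant classifiers on k equal bins
    of [0,1)), pick k minimizing empirical risk + c * sqrt(k / n)."""
    n = len(X)
    best = None
    for k in ks:
        bins = np.minimum((X * k).astype(int), k - 1)
        # ERM within H_k: majority label in each bin (empty bins default to 0)
        labels = np.array([int(y[bins == b].sum() * 2 > (bins == b).sum())
                           for b in range(k)])
        emp_risk = float(np.mean(labels[bins] != y))
        score = emp_risk + c * np.sqrt(k / n)
        if best is None or score < best[0]:
            best = (score, k, labels)
    return best[1], best[2]

# Toy data: the true boundary at x = 0.5 is representable once k >= 2,
# so the penalty should steer the selection to the smallest adequate class.
X = np.linspace(0.0, 1.0, 100, endpoint=False)
y = (X >= 0.5).astype(int)
sel_k, sel_labels = srm_select(X, y, ks=[1, 2, 4, 8])
```

Larger k always weakly reduces empirical risk, so without the penalty the selection would drift to the richest class; the sqrt(k / n) term is what trades fit against complexity.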
Oracle Bounds and Exact Algorithm for Dyadic Classification Trees
This paper introduces a new method using dyadic decision trees for estimating a classification or a regression function in a multiclass classification problem. The estimator is based on model selection by penalized empirical loss minimization. Our work consists in two complementary parts: first, a theoretical analysis of the method leads to deriving oracle-type inequalities for three different ...
Data-dependent Structural Risk Minimisation for Perceptron Decision Trees (Produced as Part of the ESPRIT Working Group in Neural and Computational Learning II, NeuroCOLT2 27150)
Perceptron Decision Trees (also known as Linear Machine DTs, etc.) are analysed in order that data-dependent Structural Risk Minimization can be applied. Data-dependent analysis is performed which indicates that choosing the maximal margin hyperplanes at the decision nodes will improve the generalization. The analysis uses a novel technique to bound the generalization error in terms of the marg...
Minimizing Structural Risk on Decision Tree Classification
Tree induction algorithms use heuristic information to obtain decision tree classifiers. However, there has been little research on how many rules are appropriate for a given set of data, that is, on how to find the structure that leads to the best generalization performance. In this chapter, an evolutionary multi-objective optimization approach with genetic programming is applied t...